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1  Introduction 

“...Since  all  our  measurements  and  obser¬ 
vations  are  nothing  more  than  approximations 
to  the  truth,  the  same  must  be  true  of  all  calcu¬ 
lations  resting  upon  them,  and  the  highest  aim 
of  all  computations  made  concerning  concrete 
phenomena  must  be  to  approximate,  as  nearly 
as  practicable,  to  the  truth.  ” 

K.  F.  Gauss,  “Theoria  Motus  Corporum 
Coelestium"  (1809). 

Estimation  theory  deals  with  the  problem  of  estimat¬ 
ing  the  state  of  a  stochastic  dynamical  system  from 
noisy  observations.  The  earliest  stimulus  for  its  develop¬ 
ment  was  apparently  provided  by  astronomical  studies  of 
planet  and  comet  motion  in  the  18‘*  century.  The  mo¬ 
tion  of  these  bodies  can  be  completely  characterized  by  a 
finite  number  of  parameters  and  the  estimation  problem 
that  was  considered  was  that  of  inferring  the  value-,  of 
these  parameters  from  telescopic  measurement  data. 

To  be  more  precise,  suppose  ni  measurement  vector 
quantities  y\,. .  ■  ,ym  6  K*  are  available  at  discrete  in¬ 
stants  of  time  <1 , . . .  ,  <m-  The  parameter  vector  r  G  R" 
which  we  wish  to  determine  is  assumed  to  be  linearly- 
related  to  the  measured  data,  i.e. 

(1)  yt  =  Mkx  +  Vk 

where  Vk  represent  the  measurement  errors  that  occur 
at  each  observation  time.  Let  Xm  denote  the  the  esti¬ 
mate  of  X  based  on  the  ni  data  samples  {i/i....  ,ym}- 
Then  the  residual  r*  (1  <  h  <  m)  associated  with  the 
measurement  is  defined  to  be  the  difference  between 
the  observed  value  yt  and  the  value  predicted  from  the 
estimate  Xm : 

(2)  r*  =  yt  -  Mkim 

Around  1795  Karl  Friedrich  Gauss  invented  the  rev¬ 
olutionary  method  of  least-squares  for  attacking  the 
above  problem.  The  method  was  independently  invented 
and  published  by  Legendre  in  1806  in  his  book  “Aou- 
velles  methodes  pour  la  determination  des  orbites  des 
comeies”.  A  detailed  description  of  the  method  was  pub¬ 
lished  by  Gauss  in  1809  in  his  book  “Theoria  Motus  Co-^- 
porum  Coelestium”.  The  name  “least-squares  method” 
comes  from  the  fact  that  the  optimal  estimate  x„,  for  x 
based  on  the  observations  Vm  =  {yi}i<t<m  is  the  value 
of  X  which  minimizes  an  appropriately  weighted  sum  of 
the  squares  of  the  residuals 

m 

(•3)  L,„  =  '^rlWkrk 

ks\ 

where  the  elements  of  the  matrices  M-’t  are  selected  to 
indicate  the  degree  of  confidence  that  one  can  place  on 
the  individual  measurements. 

We  notice  that  the  state/parameter  vector  x  is  as¬ 
sumed  to  be  fixed  in  the  above  description.  Moreover, 
the  least-squares  approach  hats  no  probabilistic  meaning. 

Mt  should  be  noted  that  Gauss  also  considered  a  proba¬ 
bilistic  approach  to  the  estimation  problem,  but  rejected  in 
favor  of  minimizing  function  3. 


The  general  shift  of  attention  to  dynamical  .systems  m 
the  20**  century,  as  well  as  the  introduction  of  the  mai- 
tmum  likelihood  method  by  R.  A.  Fisher  in  1912  led  to 
new  developments  in  the  field  of  estimation  theory.  Kol¬ 
mogorov  in  1941  and  Wiener  in  1942  independently  de¬ 
veloped  a  linear  minimum  mean  square  estimation  tech¬ 
nique  that  received  considerable  attention  and  laid  the 
foundations  for  the  development  of  Kalman  filter  theory 
This  approach  allowed  for  systems  with  state  changing 
with  time,  as  well  as  both  continuous  and  discrete  obser¬ 
vations.  Kolmogorov's  and  Wiener's  work  focuses  on  the 
analysis  and  synthesis  of  systems  in  terms  of  their  input- 
output  characteristics,  reflecting  the  general  trend  in  the 
scientific  community  at  that  time.  The  problems  were 
formulated  in  terms  of  integral  e(|uations  and  the  main 
tools  used  were  the  Laplace  and  Fourier  transforms. 

Subsequent  scientific  developments  have  stressed  the 
state  space  approach,  which  uses  difference  and  differen¬ 
tial  rather  than  integral  equations  for  describing  a  sys¬ 
tem.  Although  both  these  approaches  are  mathemati¬ 
cally  equivalent  the  latter  proved  much  more  convenient 
and  useful,  opening  the  door  to  many  new  developments. 

The  state  space  reformulation  of  the  estimation  prob¬ 
lem  suggested  a  recursive  approach,  first  attempted  for  a 
specific  system  in  1955,  by  J.  W.  Follin  at  John.-,  Hopkin.-- 
University.  Five  years  later,  R.  E.  Kalman  )>nblished  a 
very  influential  paper  ([*20]).  in  which  he  described  an 
optimal  recursive  algorithm  for  solving  the  linear  esti¬ 
mation  problem  using  a  general  state  space  approach. 
This  became  known  as  the  Kalman  filter.  In  a  sense,  the 
Kalman  filter  is  nothing  but  an  efficient  computational 
solution  of  Gauss’s  least-squares  problem  in  a  more  gen¬ 
eral  state  spare  setting. 

The  value  of  Kalman's  reformulation  was  that  it  led  to 
significant  new  insights  and  had  the  effect  of  unifying  ail 
previous  results.  Moreover,  the  Kalman  filter  eciuations 
provided  an  extremely  convenient  procedure  for  digital 
computer  implementation,  and  its  recursive  structure, 
together  with  the  fact  that  the  Kalman  gain  is  inde¬ 
pendent  of  the  observations  and  can  be  precomputed, 
opened  the  door  for  real  time  estimation. 

Nature  is  inherently  nonlinear.  Hence,  in  order  to  ap¬ 
ply  estimation  techniques  to  real  physical  systems,  it  was 
necessary  to  develop  algorithms  that  ran  deal  with  non¬ 
linear  dynamical  systems.  There  are  many  reasons  why 
nonlinear  filtering  theory  is  much  hanier  than  linear  the¬ 
ory.  The  main  difference  is  that  the  linear  filter  has  es¬ 
sentially  a  finite  description;  the  state  of  the  system  con¬ 
sists  of  the  mean  and  the  covariance  matrix,  which  are 
enough  to  completely  determine  the  conditional  proba¬ 
bility  density  that  contains  all  relevant  information.  In 
the  nonlinear  rase,  the  state  of  the  filler  is  infinite,  con¬ 
sisting  of  the  whole  conditional  density  function  used  to 
compute  optimal  estimates.  The  numerical  problems  in¬ 
volved  in  computing  the  whole  probability  density  func¬ 
tion  are,  in  general,  intractable,  since  they  involve  the 
solution  of  complicated  integro-ditferential  equations  or 
functional  integral  equations. 

In  practice,  finite  approximations  to  the  conditional 
probability  density  are  considered.  An  approximate  non¬ 
linear  filter  is  obtained  by  parametrizing  the  conditinnal 


density  via  a  finite  set  of  parameters,  and  computing 
equations  for  the  evolution  of  these  parameters,  which 
comprise  the  state  of  the  system.  One  of  the  most  popu¬ 
lar  approaches  used  in  practice,  is  to  linearize  the  system 
in  question  and  apply  linear  filtering  theory.  This  gives 
rise  to  the  Extended  Kalman  Filter  and  its  variants.  The 
main  characteristic  of  all  these  approaches  is  that  they 
try  to  follow  focsffy  a  nominal  trajectory,  keeping  track  of 
how  the  state  (finite  approximation  to  an  infinite  condi¬ 
tional  probability  density)  changes.  This  local  character 
imposes  serious  limits  to  the  applicability  and  effective¬ 
ness  of  these  algorithms. 

We  notice  that  the  traditional  estimation  algorithms 
are  in  effect  attempts  to  extend  inherently  linear  ideas 
to  the  nonlinear  setting.  One  can  completely  understand 
linear  systems  by  looking  at  isolated  integral  curves,  but 
this  is  hopeless  for  the  nonlinear  case,  because  of  the 
immense  complexity  of  nonlinear  evolution . 

Henri  Poincare  was  the  first  to  realize  that  the  at¬ 
tention  should  be  shifted  from  an  analysis  of  isolated 
trajectories  and  local  characterizations,  to  a  more  global 
topological  and  geometric  understanding  of  the  phase 
space  of  nonlinear  dynamical  systems^.  At  the  present 
time  we  are  witnessing  a  spectacular  blossoming  of  non¬ 
linear  dynamics,  made  possible  on  the  one  hand  by  great 
theoretical  developments  on  global  topological  and  geo¬ 
metrical  analysis,  initiated  by  Poincare’s  revolutionary 
work  in  the  19‘*  century,  and  on  the  other  hand  by  the 
wide  availability  of  increasingly  powerful  computers. 

We  believe  that  very  interesting  new  insights  and  ex¬ 
tremely  accurate  novel  algorithms  can  be  obtained  by 
attacking  parameter  estimation  problems  using  a  global 
geometrical  point  of  view.  Moving  up  one  level  of  ab¬ 
straction  we  wish  to  consider  parameter  estimation  algo¬ 
rithms  whose  primitive  objects  are  geometric  structures 
in  phase  space  (represented  as  points  in  an  appropriate 
function  space),  rather  than  points  on  isolated  trajecto¬ 
ries. 

We  demonstrate  how  to  exploit  the  complexity  of 
global  geometrical  phase  space  structures  of  nonlinear 
dynamical  systems  and  their  dependence  on  parameter 
variations  in  order  to  obtain  extremely  accurate  param¬ 
eter  estimation  algorithms  that  do  not  depend  on  local 
approximations,  in  the  context  of  complex  analytic  dy¬ 
namics. 

in  particular,  the  global  algorithm  was  tested  on  the 
family  of  quadratic  maps  of  the  Riemann  sphere  and  ra¬ 
tional  maps  obtained  from  Newton’s  method  on  complex 
cubic  polynomials.  We  show  how  to  transform  the  esti¬ 
mation  problem  into  a  problem  of  minimizing  a  dissimi¬ 
larity  measure  between  images  containing  global  dynam¬ 
ical  information.  Our  experiments  indicate  that  the  dis¬ 
similarity  function  to  be  minimized  is  locally  unimodai. 
In  fact  it  seems  to  obey  an  exact  power  law  locally.  The 
exponent  appears  to  be  a  new  invariant  of  these  dynami¬ 
cal  systems,  which  we  call  the  parametfr  srnstitvtiy  expo¬ 
nent.  and  which  characterizes  the  dependence  of  global 

^It  i.<i  interesting  to  note  that,  just  like  (iauss'  idea  in 
the  case  of  estimation  theory,  the  motivation  for  Poincare's 
development  of  global  geometrical  dynamics  comes  from  the 
study  of  the  motion  of  celestial  bodies. 


geometric  structures  on  parameter  varialion.s. 

The  global  algorithm  gives  extremely  arcuratf  esti¬ 
mates  for  the  parameters  of  these  system.s  systems  (im¬ 
proving  the  initial  estimate  by  more  than  l:$  orders  of 
magnitude  in  a  certain  case).  Moreover,  appears  to  Ik 
very  robust  with  respect  both  to  observation  uul-..  ami 
dynamical  noise. 

2  Parameter  Estimation  and  Global 
Geometry 

The  study  of  how  geometric  structures  in  pha.se  spare 
change  as  system  parameters  vary  is  of  great  interest 
and  h2is  received  much  attentioti.  The  main  focus  so  far 
has  been  the  study  of  changes  in  the  tojmlogy  of  plia.se 
spare  structures  (bifurcation  theory). 

Our  objective  is  to  exploit  the  way  exact  gconictncnl 
rather  than  topological  features  of  plia-se  space  structures 
change  cis  system  parameters  vary  slightly,  in  order  to 
obtain  novel  extremely  accurate  parameter  eslimaiion 
algorithms  that  do  not  depend  on  local  approximatiotis 

This  approach  led  us  to  the  discovery  of  what  seems  to 
be  a  very  general  power  law  that  enables  us  to  (|uantify 
the  dependence  of  global  geometry  on  small  changes  in 
the  parameters  of  the  system. 

In  order  to  demonstrate  our  approach  we  will  restrict 
our  attention  to  how  basins  of  attraction  change  as  pa¬ 
rameters  vary,  and  show  how  to  transform  the  param¬ 
eter  estimation  problem  into  an  optimization  problem 
over  an  appropriate  space  of  functions  containing  global 
dynamical  information. 

Our  discussion  in  this  paper  will  be  restricted  to  es¬ 
timating  a  single  parameter  of  a  system,  but  our  tech¬ 
niques  can  be  readily  generalized  to  higher  dimensional 
problems. 

We  begin  by  demonstrating  this  approach  for  the  fam¬ 
ily  of  complex  quadratic  polynomials. 

2.1  The  Quadratic  Family 

(liven  any  complex  quadratic  polynomial  p(z)  =  «;■’  ■+ 
2bz  -f  d,  let  M(z)  =  ac  -f  6  and  c  =  nd  b  —  b- .  If 

:  C  — '  C  denotes  the  map  /<•(;)  =  r-’  -p  r,  where  C  is 
the  Riemann  sphere,  then; 

(4)  M-'  ofroM(z)  =  Af'((a.--l-6)-  +  r) 

=  Af~'(n*2'  -p  '206:  -P  6*  -P  r) 

_  (n-'c-  -p  2abz  -P  6-  -P  r)  —  6 
a 

=  P(-) 

i.e.  p  and  fr  are  (analytically)  conjugate.  It  follows 
that  in  order  to  understand  the  dynamics  of  all  complex 
quadratic  polynomials,  it  is  enough  to  umlerstami  the 
dynamics  of  the  complex  one-parameter  family 

c  =  {/;:C-c.cec,yv(.)  =  .--Pri 

Both  the  variable  ;  and  the  parameter  r  till  out  a  com¬ 
plex  plane.  We  will  refer  to  the  r-plane  as  the  dynamtrnt 
plane  and  to  the  r-plane  as  the  parameter  ptam. 


2 


2.1.1  Julia  Sets 

Give^a  rational  map  /  ;  ^  ♦  C  of  the  Riemann 

sphere  C  =  Cu  {oo},  we  can  get  a  dynamical  system  by 
repeated  application  of  /.  In  the  begining  of  the  twen¬ 
tieth  century,  the  French  mathematicians  F*.  Faiou  and 
G.  Julia  studied  the  iteration  of  complex  polynomials  of 
degree  d  >  2.  Having  at  their  disposal  a  powerful  the¬ 
orem  of  Montel  that  gave  a  sufficient  condition  for  the 
normality  of  a  family  of  meromorphic  functions,  they  re¬ 
alized  that  it  is  very  interesting  to  consider  the  following 
decomposition  of  the  dynamical  plane: 

Defiuitiou  2.1  A  point  z  £C  is  an  element  of  the  Fa- 
ton  set,  Ff  of  a  rational  mapping  f,  if  there  exists  a 
neighborhood  U  of  z,  such  that  the  family  of  iterates  {/" } 
is  a  normal  family  on  V .  The  Julia  set  Jj  of  f  is  the 
complement  of  the  Fatou  set. 

where,  /"  denotes  n-fold  functional  composition  of  /  by 
itself. 

Without  recalling  the  exact  definitions,  let  us  only  re¬ 
mark  that  normal  families  bave  values  that  do  not  di¬ 
verge  under  iteration.  So,  in  some  sense  the  Fatou  an_d 
Julia  sets  of  /  are  the  sets  of  stable,  unstable  points  of  C 
with  respect  to  /,  respectively. 

We  notice  that  the  point  at  infinity  oo  is  always  an 
attracting  fixed  point  for  quadratic  maps^.  Let 

j4r(oo)  =  {z  £  C  :  f^  —  OO  as  »j  —  oo} 

be  the  basin  of  attraction  of  infinity.  We  have  the  fol¬ 
lowing  result; 

Proposition  2.1  The  Julia  set  Jj^  of  ff  :  C  —  C. 
ff{z)  =  r'  -I-  c,  is  equal  to  ^i4c(oo). 

The  proof  of  this  statement  is  a  direct  consequence  of 
the  fact  that  the  boundary  of  any  completely  invariant 
component  (here  /l,(oo))  of  the  complement  of  the  Julia 
set  has  to  equal  the  Julia  set. 

The  set  =  C  —  j4r(oo)  is  called  the  filled  in  Julia 
set.  We  clearly  have  dKf  =  J/,,  =  dAf{oc),  i.e.  Jj^ 
separates  competition  between  orbits  being  attracted  to 
00  and  orbits  remaining  bounded  for  all  time. 

Example  1:  (Consider  the  map  /o  ;  C  —  C,  /o(-)  = 
Z-.  (Clearly  any  point  outside  the  unit  circle  .V'  has  an 
orbit  that  escapes  to  infinity.  Moreover,  any  point  in¬ 
side  the  unit  circle  has  an  orbit  converging  to  0.  Conse¬ 
quently,  the  Julia  set  Jj^  of  /o  is  the  unit  circle  .V, 

Example  2:  The  Julia  set  of  the  map  /_2  :  C  —  C, 
/_2(- )  =  —  2  is  the  interval  [—2, 2]  on  the  real  liiie^. 

In  general,  Julia  sets  are  not  smooth,  but  very  com¬ 
plicated  fractal  objects,  exhibiting  an  amazing  variety  of 
geometric  forms  as  the  parameter  c  changes  Figure  I 
(taken  from  [26])  shows  examples  of  Julia  sets  of  complex 
quadratic  polynomials  corresponding  to  various  points  r 
on  the  complex  plane,  along  with  the  Mandelbrot  set 
which  controls  their  topological  structure. 

^In  fact  oc  is  an  ^attracting  fixed  point  for  all  complex 
polynomial  maps  on  C 

^The  proof  is  not  obvious.  See  [6]. 
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2.1.2  The  Mandelbrot  Set 

In  190.5,  P.  Fatou  proved  the  following  very  surprising 
result®; 

Theorem  2.1  Every  attracting  cych  for  a  polynuniinl 
or  rational  function  attracts  at  least  out  critn  al  point 

Each  quadratic  polynomial  ff  in  Q  has  a  unii|ue  crit¬ 
ical  point,  namely  zo  =  0.  The  corresponding  critical 
value  is  ff{zo)  =  /f(0)  =  c.  It  follows  that  /,-  can  have 
at  most  one  attracting  cycle  in  the  complex  plane.  More 
generally,  a  polynomial  of  degree  d  >2  can  have  at  most 
d  —  1  attracting  cycles. 

In  1918-1919  P.  Fatou  and  (L  Julia  proved  another 
result  which  further  supported  their  conjecture  that  the 
dynamical  behavior  is  dominated  by  the  behavior  of  crit¬ 
ical  points: 

Theorem  2.2  Let  ilj  denote  tin  set  of  critical  points 
for  a  polynomial  f  :  C  C.  and  let  l\j  be  the  set  of  all 
points  in  C  which  do  not  escape  to  infinity  under  f.  i.e. 
l\j  =  C  —  j4(oo).  Then: 

1.  ilj  C  l\f  Jj  IS  connected. 

2.  11/0  Kf  =  m  ^  Jj  IS  a  Cantor  Set. 

.Since  for  a  quadratic  map  ff,  there  exists  only  one 
critical  point  namely  ^o  =  O.  an  immediate  corollary  of 
theorem  2.2  is  the  following: 

Corollary  2.1  The  Julia  set  Jj^  of  the  quadratic  map 
ff  IS  either  connected  or  a  Cantor  set.  Moreover.  Jj 
IS  connected  if  and  only  if  ff(D)  does  not  lend  to  oc  as 


The  above  corollary  suggests  a  natural  decomposition 
of  the  parameter  plane  into  the  Mandelbrot  set 

(5)  M  =  {c  £  C  :  J}^  is  connected) 

and  its  complement  C-  M .  Moreover,  corollary  2.1  gives 
us  a  way  to  compute  the  Mandelbrot  set:  in  order  to 
check  whether  a  point  c  of  the  parameter  plane  is  in  M . 
it  is  enough  to  check  whether  the  orbit  of  0  under  fr  does 
not  tend  to  infinity. 

We  remark  that  sets  similar  to  the  Mandelbrot  set  oc¬ 
cur  in  many  other  families  of  complex  analytic  maps. 
This  happens  since  many  mappings  or  their  iterates  lo¬ 
cally  behave  like  a  quadratic  polynomial.  Hence  the 
Mandelbrot  set  is  in  some  sense  a  universal  object 
The  boundary  dM  of  the  Mandelbrot  set  is  a  bifur¬ 
cation  set,  i.e.  the  topological  nature  of  the  Julia  set 
changes  as  we  cross  this  set  in  the  parameter  plane.  In 
the  next  sections  we  will  investigate  how  the  geometry 
rather  that  the  topology  of  Julia  sets  depends  on  param¬ 
eter  variations.  . .  _ 

Asu«**ito«  for 


^The  proof  can  be  found  in  [9]. 
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2.2  Global  Parameter  Estimation  for  the 
Quadratic  Family 

The  complex  one-parameter  family  of  dynamical  systems 

(6)  Sn+l  =  =n  +  <■ 

can  be  considered  as  the  following  real  two-parameter 
family 

( ^ )  ^n+l  “  Vn  “h  A 

(8)  JM+I  =  2i„y„ -t- c 

under  the  usual  identification  of  E*  with  C  sending  z„  to 
Xn+yni  and  c  to  We  want  to  consider  the  problem 

of  estimating  A,  when  ^  is  held  constant,  in  the  presence 
of  observation  and  dynamic  noise.  In  particular,  suppose 
the  real  system  has  noisy  dynamical  evolution: 

(9)  Xn+l  =iC^-y^  +  A-l-Ud.*(«) 

(10)  y„+i  =  '2xny„  +^  +  Vd,y(n) 

and  we  actually  observe; 

(11)  Xn  =  In  + 

(12)  y„  =  yn  +  t)o,y(n) 

where 

both  that  the  dynamical  {t^d,r(«)}n€Z+.  {*'<f,y(”)}n€Z+ 
as  well  as  the  observation  {tJo.T(«)}n€Z+i  (t'o,y(’*)}n€Z+ 
noise  sequences  are  white  Gaussian  random  sequences. 

2.2.1  Setting  up  global  functions  for  quadratic 
maps 

The  primitive  objects  used  in  local  methods  are  points 
on  isolated  trajectories.  In  the  next  section,  we  will  show 
how  to  obtain  extremely  accurate  estimates  of  the  pa¬ 
rameter  A  by  moving  up  one  level  of  abstraction  and 
consider  an  algorithm  that  uses  representations  of  the 
Julia  sets  of  the  maps  as  primitive  objects.  In  this  sec¬ 
tion,  we  describe  how  to  use  proposition  2.1  in  order  to 
obtain  a  djscretje  representation  of  Julia  sets  of  quadratic 
maps  fr  :C  —  C,  /<-(x)  =  z'^  +  c.  The  method  we  will  de¬ 
scribe  is  often  refered  to  as  the  Level  Set  Method  (LSM) 
{[26],  [27]). 

We  restrict  our  attention  to  a  domain  D  C  C  «  R  x  R, 
on  which  we  impose  a  grid  of  n  x  m  cells.  CJhoose  a  large 
integer  Nmar  (iteration  resolution)  and  an  arbitrary  set 
T  (target  set)  containing  oo.  We  will  take  T  =  {z  : 
ll-li  ^  where  R  is  &  large  number®.  For  each  cell 
in  the  decomposition  of  the  domain  D  assign  an  integer 
label  L(p)  =  L(p.T),  where  p  is  the  centerpoint  of  the 
cell,  in  the  following  way; 


the  equipotential  curves  of  the  filled  in  Julia  set  Kr  when 
Jc  is  connected  (see  for  example  [27]). 

As  c  moves  on  the  parameter  plane  where  the  Mandel¬ 
brot  set  lives,  the  corresponding  Julia  set  changes  from 
shape  to  shape  producing  an  immense  variety  of  possible 
geometric  forms  (figure  1 ) 

Let  us  fix  ^  to  the  value  i  =  0..'J.  and  consider  how 
the  geometry  of  the  Julia  sets  changes  as  A  varies  locally 
around  the  value  A  =  —  1.  Figures  2.  3,  4  show  a  window 
of  the  Julia  set  for  A  =  —  1 .0.  A  =  —  1 .00001 .  A  =  —  1 .0001 
respectively.  We  see  that  the  human  eye  can  clearly  sense 
changes  of  the  order  10“'*  or  so  and  tell  which  phase 
image  from  3  and  4  is  closer  to  2.  We  expect  that  by 
comparing  these  images  we  ran  sense  very  small  changes 
in  parameters  and  obtain  extremely  accurate  esliiiialion 
algorithms.  In  the  next  section  we  describe  how  to  turn 
the  above  intuitive  approach  of  comparing  images  to  get 
an  estimate  of  a  parameter  into  a  precise  aigoritliin. 

2.2.2  The  Global  Approach 

In  order  to  be  able  to  store  in  the  computer  and  ma¬ 
nipulate  the  images  that  we  get  by  running  LSM  on  a 
domain  D  C  R  x  R,  we  chose  to  represent  them  as  two 
dimensional  arrays  A  —  Ax(0)  =  {ftfj)i<i<ri,i<j<t7i 

The  array  A  —  Ax(D)  is  a  lookup  table  (a  discrete 
representation)  of  a  function: 

Fx-.D-^R 

which  we  will  refer  to  as  the  global  function  for  fr  on  D. 

Before  we  proceed,  let  us  recall  that  if  (A',  /<)  is  a  mea¬ 
sure  space,  for  p  >  V  =  U’(X,(i)  =  {/  :  A’  — -  IF. 
measurable  such  that  ||/||dp  <  oo).  It  is  a  standard 
result  of  functional  analysis  that  L^(X,n)  is  a  vector 
space  and  ||.||p  is  a  norm  on  L’'{X.fi).  For  /  €  L’’  the 
value: 

(14)  !l/llr  =  (/^.  ' 

is  called  the  LP-norm  of  /.  Let  us  define  the  ^'’-distance 
between  two  functions  f,gE  U’  to  be; 

(15)  dp(/.y)  =  ll/-y|lr 

Let  us  choose  the  domain  ZA  to  be  a  compact  rectangle 
in  E^.  Then  the  function  Fx  G  ,  where  X  =  D  and 
p  is  the  Lebesgue  measure  on  D  C  R‘  The  distance  d,, 
gives  us  a  measure  of  how  different  Fx,  F„  are  for  A,// 
two  different  values  of  the  parameter.  If 

Ax  =  («f,;)l<><n.l<j<T>i 

Afi  ~  (^i.y ) I <y <771 


(13) 

(  k  provided  fe(p)  i  T  and  /*(p)  €  T, 
l^(p)  =  c  for  0  <  i  <  1:  and  k  <  Amar 
1 0  otherwise 

If  lr(p)  is  nonzero  then  p  escapes  to  infinity  and  L(p)  is 
the  escape  Itvie  (measured  in  the  number  of  iterations) 
needed  to  hit  the  target  set  T  around  oc.  The  contours 
obtained  by  the  above  algorithm  are  approximations  of 

®ln  most  of  our  experiments  we  lake  R  =  100. 


are  the  discrete  representations  of  Fx,  F^  respectively,  a 
natural  measure  of  their  difference,  is  the  discrete  U'- 
distance  (p  <  oo), 

(16) 

•  n  ni  *1  l/p 

d,{Ax,A^)  =  HA.  -  A, II,  =  -  a;;,|r 

'•1=1  J=l 

From  now  on  we  will  tend  to  use  the  same  notation  for 
both  the  continuous  quantities  and  their  discrete  repre¬ 
sentations 


Figure  2:  Julia  Set  for  A  =  -1.0,4  =  0.3,  D  =  [-0.079W.5,  -O.OTO.Vi.^)  x  [0.263320, 0.26.53.^]. 
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Figure  3;  Julia  Set  for  A  =  -l.OOOOJ,^  =  0.3,  I)  =  (-0.079r>r)5,  -0.079525]  x  [0.265320, 0.2053.50). 


It  would  seem  that  the  1/  distance  of  global  functions 
F\ .  is  a  rather  coarse  characterization  of  the  differ¬ 
ences  of  the  corresponding  images.  It  turns  out  that  it 
is  ideally  suited  for  doing  parameter  estimation  in  the 
presence  of  noise.  The  reason  is  that,  intuitively,  taking 
the  L’’  distance  of  global  functions  has  the  effect  of  av¬ 
eraging  away  the  effects  of  noise,  returning  a  measure  of 
the  difference  between  the  images  which  i.s  rather  noise 
insensitive . 

(liven  a  parameter  value  A,  let  us  define  the  following 
functional: 

(17)  ii>x:L^^R 

(18)  =  dp(Fx,G) 

Let  us  now  define  the  dtssimiianiy  function  for  parame¬ 
ter  A  to  be  the  function: 

(19) 

(20)  <px(ft)  =  ^>x{Fp)  =  eip{Fx,Fp) 

As  /i  —  A  we  expect  <px(n)  =  dp{Ax.Ap]  —  0.  In¬ 
tuitively,  we  expect  the  dissimilarity  function  to  he 
unimodal'.  locally  around  A  with  a  local  minimum  at 
^t  =  X. 

Our  experiments  not  only  confirm  the  above  conjec¬ 
ture,  but  indicate  that  there  is  a  lot  of  structure  in  the 
way  the  dissimilarity  function  decreases  to  the  local  min¬ 
imum:  an  exact  power  law  is  obeyed  around  A.  More¬ 
over,  local  unimodality  of  fx  around  A  seems  to  be  very 
robust  with  respect  to  dynamical  and  especially  obser¬ 
vation  noise. 

Once  we  know  the  dissimilarity  function  ipx  is  locally 
unimodal,  with  the  real  value  of  the  parameter  A  be¬ 
ing  the  local  minimum,  we  can  use  one  of  the  .standard 
optimization  algorithms  to  determine  A.  In  our  experi¬ 
ments  we  use  the  (lolden  .Section  Method,  described  in 
appendix  A.  This  method  uses  the  fact  that  we  can 
bracket  the  location  of  the  minimum  of  a  unimodal  func¬ 
tion  by  evaluating  the  function  at  two  distinct  points  in 
the  region  L  of  unimodality. 

2.3  Perfurmauce  of  the  Global  Algorithm  for 
the  Quadratic  Family 

In  this  section  we  give  a  list  of  sarnple  r_uns  of  the  global 
algorithm  for  quadratic  maps  fr  :  C  — *  C,fr(:)  = 

If  r  =  A-l-^t  we  want  to  consider  the  problem  of  estimat¬ 
ing  A  assuming  ^  is  fixed,  in  the  presence  of  observation 
and/or  dynamical  noise  (equations  11,  9). 

Let  Fx  be  the  (noisy)  global  function  whose  discrete 
repre.sentation  Ax  is  obtained  by  performing  measure¬ 
ments  (according  to  equation  11)  on  the  noi.sy  real  sys¬ 
tem.  Local  minimization  of  the  dissimilarity  function 

>px  :  K  — >  K 

^A  function  :  P  is  said  to  be  unimodal  on  a  closed 
interval  L  C  R  if  there  is  an  z*  €  L  such  that  z*  iiiiniinizes 
1,5  on  L  and,  for  any  two  points  zi,zj  €  L,  such  that  zi  <  zj 
we  have; 

Z2  <  Z*  =*•  /(Zl)  >  /(Zj) 
z*  <  Zi  =»•  /(zj)  >  /(Zl) 

Note  that  unimodal  functions  are  not  necessarily  differen¬ 
tiable  or  even  continuous. Strictly  convex  functions  and  most 
of  their  generalizations  are  unimodal. 


/a(/‘)  =  di(Fx.  Fp) 

is  obtained  using  the  (iolden  .Section  Method  I’ha.st- 
windows  are  represented  as  .512  x  .512  arrays,  and  the 
Boundary  .Scanning  algorithm  used,  checks  for  escape 
or  orbits  out  of  a  circle  of  radius  100  centered  at  ih>’ 
origin.  The  noisy  phase  portrait  .d;,  corresponds  in  A  == 
—  1.0  and  is  obtained  with  (laussian  observation  noise 
with  variance  <t„  =  10“^  and  no  dynamic  noise  Tin- 
accuracy  achieved  gives  an  upper  bound  on  the  distance 
of  the  global  method  estimate  from  the  real  value  of  the 
parameter. 

•  A  =  0.3 

1.  Domain:  [—1.5.— 1.0]  x  [0.0.  0.5] 

Accuracy  Achieved  :  10~*‘' 

Number  of  Orbit  Points  in  typical  image  : 
4120137 

Average  Number  of  Orbit  Points  per  cell  :  15 

2.  Domain:  [— 1 .445.  —  1 .37]  x  [0.05.  0.2] 

Accuracy  Achieved  ;  10'" 

Number  of  Orbit  Points  in  typical  image  : 
71.50.541 

Average  Number  of  Orbit  Points  per  cell  :  27 

3.  Domain 

[-1.40.507,-1.40.506]  x  [0.1001.55.0.10016.5] 
Accuracy  Achieved  :  10~'^ 

Number  of  Orbit  F'oints  in  typical  image  : 
17446074 

Average  Number  of  Orbit  Points  per  cell  :  66 

4.  Domain 

[-0.079555.-0.079.52.5]  x  [0.26.5320,0.26.53.50] 
Accuracy  Achieved  :  10“''' 

Number  of  Orbit  Points  in  typical  image  : 
33692290 

Average  Number  of  Orbit  Points  per  cell  :  126 

5.  Domain 

[-0.079.54.). -0.079.53.5]  x  [0.265330.0.26.5340] 
Accuracy  Achieved  :  10“'® 

Number  of  Orbit  Points  in  tyjiical  image  ; 
36951016 

Average  Number  of  Orbit  Points  per  cell  :  140 
.  A=0..3.5 

1.  Domain:  [—1.445, —  1.37]  x  [0.05.0.2] 

Accuracy  Achieved  :  10“® 

Number  of  Orbit  Points  in  typical  image  : 
3905.591 

Average  Number  of  Orbit  Points  per  cell  :  15 

2.  Domain:  [- 1 .4075,  - 1 .4070]  x  [0.0992. 0.0997] 
Accuracy  Achieved  ;  10“'" 

Number  of  Orbit  Points  in  typical  image  : 
9058660 

Average  Number  of  Orbit  Points  per  cell  :  35 

3.  Domain 

[-1.40715,-1.40710]  x  [0.099.57.0.09962] 
Accuracy  Achieved  :  10“ " 

Number  of  Orbit  Points  in  typical  image 
12225105 

Average  Ntirnlier  of  Orbit  Points  per  cell  47 

•  A=0.40 


1.  Domain  :  [-1.445,-1.37]  x  [0.05,0.2] 

Accuracy  Achieved  :  10“* 

Number  of  Orbit  Points  in  typical  image  : 

3100158 

Average  Number  of  Orbit  Points  per  cell  12 


1  <  i  <  3,  then  the  value  assigned  to  the  cell  in  ^lne^tlo^ 
is  3it  +  (i  -  1). 

Suppose  p\,pn  are  known.  We  want  to  consider  the 
problem  of  estimating  ps  in  the  presence  of  ohservaiiDii 
and  dynamical  noise.  We  let 


2.4  Cayley’s  Problem  aud  Newtou  Basins 

In  this  section,  we  consider  the  problem  of  estimating 
a  parameter  in  a  dynamical  system  obtained  from  New¬ 
ton’s  method  for  cubic  complex  polynomials. 

Newton’s  method  and  its  variants  are  among  the  most 
prominent  numerical  methods  for  finding  solutions  to 
nonlinear  equations.  From  a  numerical  point  of  view 
Newton’.*;  method  has  always  been  considered  a  local 
method,  :.e  on?  tissumes  that  the  initial  guess  is  suffi¬ 
ciently  close  (o  a  root,  and  then  the  orbit  under  Newton’s 
iteration  scheme  tends  to  this  root. 

hi  1879  A.  C-’ayley,  in  a  one  page  paper  [10].  sug¬ 
gested  the  extension  of  what  he  calls  the  Newton-Fourier 
method 

(21)  ,  = 

applied  to  complex  polynomials  /  :  C  —  C,  where  h  is 
a  real  number.  It  is  interesting  to  note  that  21  can  be 
interpreted  as  the  Euler  method  with  stepsize  h  for  the 
initial  value  problem: 


(22) 


x(t)  = 


(23)  z(0)  =  xo 

Each  of  the  roots  of  /  is  an  attracting  tixeil  jioint  of 
the  Newton-Fourier  iteration.  C-ayley  suggested  .study¬ 
ing  the  metliod  globally,  i.e.  understanding  the  geometry 
of  the  basins  of  attraction  of  the  roots  in  pha.se  space. 

The  proldem  is  easy  in  the  case  of  ((uadratic  (lolyiiomi- 
als:  we  have  seen  that  any  quadratic  map  is  analytically 
conjugate  to  one  of  the  form  /<•(;)  =  -f-  c.  .Newton's 

method  for  /,.  is  a  rational  map  of  degree  2; 

(24)  N(x)  = 

2x 

It  ran  be  shown  that  the  Julia  set  Js  of  A'  is  the  perpen¬ 
dicular  bisector  of  the  segment  joining  the  roots  ±y/c. 
Thus  the  basins  of  attraction  of  the  two  roots  are  the 
half  planes  defined  by  J/v . 

The  geometry  of  the  problem  for  higher  degree  poly¬ 
nomials  is  extremely  complicated.  To  get  a  feeling  for 
why  that  should  be  so,  it  is  enough  to  note  that  we  have 
an  poynomial  /  of  degree  n,  then  if  A,  is  the  ba-sin  of 
attraction  of  the  root  pi  of  /,  we  must  have: 

J{  —  dAi,  i  =  1 , . . .  ,  n 

i.e.  each  point  in  the  Julia  set  Jj  must  touch  simultane¬ 
ously  all  basins  of  attraction.  Figure  5  shows  the  Newton 
basin  portrait  for  the  cubic 

9(*)  =  (i  -  0.5t)(;  +  0.5t)(c  -  1) 

The  values  cissigned  to  each  cell  in  pha.se  spare  corre¬ 
spond  to  convergence  time  to  a  root.  If  k  is  conver¬ 
gence  time  of  the  center  point  of  a  cell  to  p,  where 


-  P\  )(-  -  P2](:  -  p-.i) 

and  define  F,,,  a  to  be  the  Newton  basin  glolial  image 
corresponding  the  polynomial  .Suppose  the  real 

value  for  A  is  A  =  1.0  and  let  us  restrict  A  to  move  on  the 
real  axis.  This  gives  us  a  situation  exactly  analogous  to 
the  one  for  Julia  sets  of  complex  (|uadraiir  majis.  Again 
we  get  an  estimation  algorithm  by  trying  to  minimize: 

CS*  '  1?  I? 

T'/>1  ,P2  • 

F/ii,e2(/*)  —  ( ^(>1  ■  ^<’1 

where  Fp^  p^  x  is  the  noisy  .Newton  basin  global  function 
for  the  third  root  ecjual  to  A. 

2.5  Performance  of  tlie  Global  Algorithm  for 
Newton’s  Basins 

In  this  section  we  give  a  list  of  sample  runs  of  the  global 
algorithm  for  the  case  of  dynamical  systems  obtained  by 
Newton's  method.  Loral  minimization  of  the  dissimilar¬ 
ity  function  <Pp^^p^  where  px  =  0.5?  and  p2  =  -0.5?,  is 
obtained  using  the  Golden  Section  Method.  Global  func¬ 
tions  are  represented  as  512  x  512  arrays.  If  the  number 
of  iterations  it  takes  for  the  renterpoint  of  a  given  cell  to 
enter  a  neighborhood  of  one  of  the  roots,  say  p,  (where 
Pa  =  p  is  the  parameter)  is  k,  then  the  value  assigned  to 
each  the  cell  in  question  is  3/1--K?-  1).  The  noisy  phase 
portrait  Fp,,pj  x  corresponds  to  A  =  1,0  and  is  obtained 
with  (iaussian  observation  noise  with  variance  a,,  =  10“^ 
and  no  ilynamic  noise.  The  accuracy  achieved  gives  an 
upper  bound  on  the  distance  of  the  global  method  esti¬ 
mate  from  the  real  value  of  the  parameter. 

1.  Domain;  [-0.044, -0.024]  x  [-0. 105, -0.08.5] 
Accurary  Achieved  ;  lO'^ 

Number  of  Orbit  F’oints  in  typical  image  :  10538413 
Average  Number  of  Orbit  Points  per  cell  :  40 

2.  Domain 

[-0.033«42.  -0.03.3822]  x  [-0.093942.  -0.093922] 
Accuracy  Achieved  :  10“'‘* 

.Number  of  Orbit  F’oints  in  typical  image:  2590778C 
Average  Number  of  Orbit  F’oints  per  cell  :  99 

3.  Domain 

[-0.033833,  -0.033832]  x  [-0.0939325.  -0.093931-5] 
Accuracy  Achieved  :  lO'*’’ 

.Number  of  Orbit  Points  in  typical  image  ;  3488993C 
Average  Number  of  Orbit  F’oints  per  cell  :  133 

4.  Domain  :  [0.075.0.080]  x  [0.077,0.080] 

Accuracy  Achieved  :  10“” 

Number  of  f)rbit  F’oint.s  in  typical  image  1240428X 
Average  Number  of  Orbit  F’oints  per  cell  :  48 
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Figur**  5:  Newton  Basins  for  f{z)  =  (2  —  0.5j)(2  +  UJ)i)(z  —  I),  and  domain  D  —  [—0.5,  1.5]  x  [—1.0,  l.O] 
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Figure  6:  Quadratic  Family:  Plot  of  6Fx,ix  as  a  func¬ 
tion  of  ||^A||,  6X  >  0,  for  A  =  —1,  ^  =  0.3.  The 
maximum  number  of  iterations  is  100,  and  the  do¬ 
main  is  D  =  {(x,y)  :  x  €  [—0.079555, —0.079525], y  € 
[0.265320.0.265350]}.  A  512x512  cell  resolution  is  used. 
The  real  image  is  measured  with  (laussian  observation 
noise  with  variance  <t<,  =  10“^  and  no  dynamical  noise 
(<7^  =  0).  The  plot  on  the  top  shows  some  sample  points, 
and  the  one  on  the  bottom  is  the  same  plot  with  straight 
lines  connecting  the  sample  points. 


1.  10  1.  10  1.  10  1.  10  1.  10 


Figure  7:  Quadratic  Family;  Plot  of  log(6FA,4A)  as  a 
function  of  log||^A||,  6A  >  0,  for  A  =  -1,  ^  =  0.3. 
The  maximum  number  of  iterations  is  100,  and  the  do¬ 
main  is  =  {(x,y)  ■  X  6  [-0.079555. -0.079.52.5].  y  6 
[0.265320,0.2653.50]}.  A  512x512  cell  resolution  is  used. 
The  real  image  is  measured  with  (iaussian  observation 
noise  with  variance  <r„  =  10“^  and  no  dynamical  noise 
(oj  =  0).  The  plot  on  the  top  shows  some  sample  (loints, 
and  the  one  on  the  bottom  is  the  same  plot  with  straight 
lines  connecting  the  sample  points. 
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Figure  8:  Quadratic  Family:  Plot  of  log{6FA,»A)  as  a 
function  of  log))6A||,  6A  >  0,  for  A  =  —  1,  ^  =  O.H.  Eacli 
line  corresponds  to  a  different  domain.  The  resolution 
increases  from  right  to  left.  The  real  image  is  measured 
with  (iaussian  observation  noise  with  variance  =  10"^ 
and  no  dynamical  noise. 


Figure  9:  Quadratic  Family:  Plot  of  log(6rA.iSA)  as  a 
function  of  log||6A||,  6A  >  0,  for  A  =  —  1 ,  ^  =  0.3.  Each 
line  corresponds  to  a  different  domain.  The  resolution 
increases  from  right  to  left.  The  real  image  is  measured 
with  (iaussian  observation  noise  with  variance  er„  =  10“  * 
and  no  dynamical  noise. 
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Figure  10:  Newton  Basins:  Plot  of  logv7p,,^3(A  +  6A)  as 
a  function  of  ||6A|1,  6X  >  0,  for  pi  =  0.5i,Pi  =  -0.5i, 
where  A  =  1.  The  real  image  is  measured  with  Gaus¬ 
sian  observation  noise  with  variance  <t<,  =  10“^  and 
no  dynamical  noise  {aj  =  0).  The  plot  on  the  top 
shows  some  sample  points,  and  the  one  on  the  bottom 
is  the  same  plot  with  straight  lines  connecting  the  sam¬ 
ple  points.  A  512  x  512  cell  resolution  is  used  over  the 
domain  D  =  {(x,y)  :  x  G  [-0.033842, -0.033822], y  € 
[-0.093942,  -0.093922). 


Figure  11:  Newton  Basins:  fMot  of  logyr^,  ,,j(  A />A)  as 
a  function  of  ||6A||,  ^A  >  0,  for  pj  =  0..5i,p2  =  — 0.5i. 
where  A  =  1.  The  real  image  is  measured  with  (iaussian 
observation  noise  with  variance  <r„  =  10"^  and  no  dy¬ 
namical  noise  ((r^  =  0).  The  plot  on  the  top  shows  some 
sample  points,  and  the  one  on  the  bottom  is  the  same 
plot  with  straight  lines  connecting  the  sample  points.  A 
512  X  512  cell  resolution  is  used  over  the  domain  D  = 
{(x,y)  :  X  6  [-0.044, -0.024],  y  G  [-0.105,-0.08.5]}. 
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3  The  Parameter  Sensitivity  Power  Law 


.  I  *1  ^(j 

1.  10  ‘  1.  10  1.  10 


Figure  12:  Newton  Basins:  Plot  of  +  f>\) 

as  a  function  as  a  function  of  log||iA||,  ^A  >  0.  for 
Pi  =  0.5i,p2  =  — 0.5i,  A  =  1.  Each  line  corresponds  to  a 
different  domain.  The  resolution  increases  from  right  to 
left.  The  real  image  is  measured  with  (iaussian  obser¬ 
vation  noise  with  variance  <t<,  =  10“^  and  no  dynamical 
noise. 


“Let  chaos  storm! 

Let  cloud  shapes  swarm! 

/  watt  for  form.  ” 

R.  Frost.  M  Further  Range:  Ten  Mills.  (\'j 
Peritnai"  (1936). 

The  numerical  experiments  not  only  confirm  tliat  tin’ 
dissimilarity  function  ipx  is  locally  iindiinodal  around  A. 
but  indicate  that  there  is  a  lot  of  structure  in  the  way  it 
decreeises  to  the  local  minimum.  In  particular,  if  Fx  is 
the  global  function  corresponding  to  A.  and  p  =  A  -f  h\. 
then  for  ||6A||  small  enough,  a  power  law  of  the  form. 

(25)  hFx.sx  =  dp(Fx+6x.  Fx)  =  A/||(^A||'* 

is  obeyed.  We  will  refer  to  equation  25  as  the  parame¬ 
ter  sensitivity  power  law. 

Let  us  consider,  for  example,  the  (piadratic  map 
fr{z)  =  z-  +  c.  c  =  X  +  ^i.  Figure  6  is  a  plot  of 
d\(Fx+6x.  Fx)  as  a  function  of  6\.  for  A  =  —I,  ^  =  0.5 
They  shape  of  the  curve  that  we  get  immediately  sug¬ 
gests  a  power  law  around  A  =  —  1  with  exponent  less 
than  1.  If  in  fact  a  power  law  of  the  form  of  eipiation  25 
holds,  then  taking  the  logarithms  of  both  siiles  gives  a 
straight  line  of  slope  d: 

(26)  log^EA.iA  =  dlog||^A|| -I- m 

where  m  =  logAf.  Figure  7  shows  a  log-log  version  of 
figure  6,  which  is  in  fact  a  straight  line.  We  give  the 
following  definitions: 

Defiiiitiou  3.1  Suppose  that  the  local  power  law  holds 
for  some  global  function  Fx  =  Fx.p  D  —  K.  HV  define 
the  parameter  sensitivity  exponent  (p.s.e.)  ■)  of  the  global 
function  Fx  to  be  y  =  I  —  d.  Moreover,  we  define  m  = 
logAf  to  be  the  resolution  factor  of  Fx. 

The  parameter  sensitivity  exponent  •)  is  a  mea.sure  of 
the  performance  of  the  global  algorithm,  since  it  quan¬ 
tifies  our  ability  to  distinguish  nearby  global  functions 
(images).  If  7  =  0,  i.e.  a  linear  power  law  is  obeyed,  the 
global  function  is  rather  insensitive  to  parameter  varia¬ 
tions.  The  performance  of  the  global  algorithm  im|)roves 
as  7  gets  closer  to  1,  i.e.  as  the  slope  d  in  the  log  — log 
plot  decreases  towards  0. 

Looking  at  the  plot  of  logdi(fA+«A.  Fa)  as  a  function 
of  log(|6A||  for  different  domains  of  a  system  and  putting 
the  resulting  plots  together  (for  example  figures  h,  12), 
leads  to  the  conclusion  that  the  slope  d  changes  very  lit¬ 
tle  as  we  change  our  focusing  window,  i.e.  the  domain  D. 
in  phase  space!  Hence  it  makes  sense  to  talk  about  the 
parameter  sensilivitg  exponent  of  the  system.  All  experi¬ 
ments  performed  indicate  that  the  following  conjectures 
are  true: 

Coiyecture  3.1  The  parameter  sensitivity  exponent  of 
a  .system  is  the  same  for  all  typical  domain.s'*  . 
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*By  typical  we  mean  that  the  global  function  Fx  :  P  —  R 
is  representative  of  the  complexity  of  the  system.  For  exam¬ 
ple,  in  the  ca.se  of  connected  Julia  .sets  a  typical  domain  is  any 
domain  near  the  boundary  dAe{rxj).  A  domain  lying  wholy 
in  the  interior  of  the  Julia  set.  in  which  the  global  function 
is  identically  zero,  is  not  a  typical  dom.ain 


Figure  13:  Quadratic  Family:  Plot  of  log(6f'A.iA)  as  a 
function  of  log||6Al|  for  A  =  —0.12  and  ^  =  0.74.  The 
maximum  number  of  iterations  is  100,  and  the  domain  is 
^  =  {(*iy)  •  *  €  [0.3, 0.4], y€  [0.3, 0.4]}.  The  lines  from 
left  to  right  are  for  resolutions  of  512  x  512,  256  x  256. 
128  X  128,  64  X  64  cells. 

Conse(|uently,  the  parameter  seusitivity  expoueut 
seems  to  be  a  uew  dyuamical  system  iuvariaut 
that  quantifies  the  dependence  of  global  geometry**  on 
variations  of  the  parameter. 

Conjecture  3.2  Thr  parameter  sensitivity  exponent  of 
a  system  is  independent  of  the  cell  resolution'^  used 
ill  approximating  the  global  function.  Higher  resolution 
only  increases  the  resolution  factor  m. 

Of  course  the  parameter  sensitivity  exponent  (p.s.e.)  is 
not  exactly  the  same  for  all  resolutions,  because  we  get 
discretization  errors  in  low  resolutions.  The  p.s.e  stays 
close  to  a  dynamical  system  invariant  and  converges  to 
it  as  the  number  of  cells  goes  to  infinity.  An  example  on 
which  conjecture  3.2  is  tested  is  show  in  figure  13. 

To  gain  some  more  insight  into  the  power  law  let  us 
rewrite  equation  25  as: 

p,-,  bF\fx  M  M 

^  6X  -  nail—  -  ||6A|r 

Consider  the  limit 

’or  more  generally,  of  global  information  as  represented 
by  the  global  function  Fx- 

"*The  cell  resolution  is  the  number  of  cells  used  in  the  finite 
representation  of  the  global  function. 


The  above  limit  is  a  mean  denvaitve  in  L'  -norin  of  the 
global  function  Fa  with  respect  to  the  parameter  A  We 
see  that  when  d  =  1,  then  the  limit  exists.  Wlien  d  <  1 
the  limit  28  does  not  converge,  and  the  parameter  sensi¬ 
tivity  exponent  7  =  1  —  rf  measures  its  rale  of  dive rgenri 
Remark:  The  knowledge  that  a  power  law  i.s  obeyed 
locally  around  the  true  value  A  of  the  parameter  can  hr 
used  to  improve  the  performance  (number  of  function 
evaluations)  of  the  global  approach  enormously!  More¬ 
over,  it  can  be  used  to  check  for  errors  and  place  a  safe 
bound  on  the  distance  from  the  real  value  of  the  |)aram- 
eter. 

3.1  The  power  law  for  suioothly  changing 
global  functions 

In  this  section  we  demonstrate  that  if  a  global  function 
is  sufficiently  smooth  with  respect  to  changes  in  the  pa¬ 
rameter,  then  a  linear  power  law  must  be  obeyeil  locally. 
In  particular,  we  prove  the  following  proposition: 

Proposition  3.1  Given  a  global  function  Fx  :  [)  — 
R,  if  for  every  point  r  €  D,  Fa(x)  is  twin  difftnn- 
tiable  with  respect  to  the  parameter  X  and  the  denvativt 
Fx(x)/dX~  IS  continuous  and  bounded  in  a  neighbor¬ 
hood  I’  of  A,  then  a  linear  power  law  is  obeyed  locally 
for  any  U'~norm.  Moreover,  the  U' -denvativt  of  Fx 
with  respect  to  X  exists  and  is  equal  to  the  space  aver¬ 
age  ||dFA/rfA||z,». 

Proof: 

The  proof  is  a  straightforward  application  of  Taylor's 
formula.  Since  Fx{t)  has  a  continuous  second  derivative 
with  respect  to  A,  then  for  any  /<  =  A  +  6A  €  ('  we  have: 

(29)  F,{x)  =  2  +  ^n.An) 

ksU 

m 

F„,,(,/)  =  ^^''(/.-<)"F,"+'(r)dt 

where  F*(r)  =  d‘‘ Fx(r)/dX^  and  n  =  1.  If  A  is  an  upper 
bound  for  F,-(x)  over  U,  then 

(31)  l|£:i.r(/i)<  ^||/.-A|p  =  |||6A|p 

('onsetpiently,  for  ^A  small  enough  we  have 

(.32)  ||Fa+.a(x)  -  FA(r)||  «  ||^^^a|| 

The  Lf-dislance  between  Fa  and  F^  is: 

(33)  ^F(A,/iA)  =  rfp(FA+*A.FA) 

=  (^JjFx+nix)- FxixWdxY' 

Hence,  for  sufficiently  small  ||6A||: 

(34)  6F(A,6A)«  ^'|ia|l 

The  above  linear  power  law  implies  that. 

(35) 


□ 

The  above  proof  not  only  shows  that  a  power  law  holds 
locally,  but  actually  it  holds  pointwise.  The  following 
lemma  gives  us  a  simple  test  for  pointwise  validity  of  the 
power  law. 

Leiiiwa  3.1  If  a  power  law  holds  pointwise  in  norm 
then  the  power  law  with  the  same  exponent  holds 
for  any  norm  V . 

The  proof  is  trivial.  It  turns  out  that  when  the  expo¬ 
nent  of  the  parameter  sensitivity  power  law  is  not  1,  the 
law  does  not  hold  pointwise,  but  is  the  effect  of  spatial 
averaging  in  L^-norm.  Figure  14  gives  an  example  of  a 
system  where  this  is  tested  using  lemma  3.1. 

(  ‘onsider  now  the  linear  differential  equation 

(36)  ^  =  h{X)x 

on  the  real  line  R.  The  solution  of  the  above  equation 
(equation  36)  satisfying  the  initial  condition  r(0)  =  xo 
is  given  by 

(37)  x(t)  =  xoe'‘<^>' 


.Suppose  /i(A)  >  0.  In  this  case,  we  have  a  repelling 
fixed  point  at  x  =  0.  Given  a  domain  D  =  [n,6]  C  R, 
and  we  define  a  function  =  Fx.m  ;  D  — *  R  assigning 
to  each  point  in  the  domain  D  its  escape  time  from  a 
large  interval  Let  us  ignore  the  point  x  =  0, 

where  Fj,(x)  =  0,  since  it  does  not  change  anything  when 
we  integrate  over  it.  For  x  different  from  zero  we  have: 

n  _  InAf  -ln||xl| 

For  most  smooth  functions  y  the  above  global  function 
satisfies  the  requirements  of  proposition  3.1  hence  a  lin¬ 
ear  power  law  is  valid. 

For  example,  if  3(A)  =  f',  we  get 


rf-FA(x) 

dA- 


(In  Af  —  In  |lx||) 


dA2 


=  (In  Af  -  In  ||x||)f  * 


which  is  continuous  and  bounded  for  any  A.  Similarly, 
for  (?(A)  =  A  a  linear  law  holds  around  any  A  different 
from  0. 

The  same  reasoning  applies  to  the  case  when  g(X)  < 
0.  The  only  difference  is  that  Fa(x)  measures  the  time 
needed  to  enter  some  predetermined  neighborhood  of  the 
attracting  fixed  point  0,  instead  of  a  neighborhood  of 
infinity. 

Let  us  now  consider  a  linear  system 

(38)  ^  =  A(X)t 

where  x  €  R".  A  general  solution  to  (38)  can  be  obtained 
by  a  linear  superposition  of  ii  linearly  independent  solu¬ 
tions  {x'(0-  .  x^lt)}: 


2^(0  = 

i  =  i 


where  the  n  unknown  constants  Cj  are  to  be  determined 
by  initial  conditions.  If  .4(A)  has  u  linearly  indepeiuleni 
eigenvectors  ,  1  <  J  <  "  then  we  iiiay  tak>-  a>  .t  b.i'i- 
for  the  spare  of  solutions  the  vector  valued  funciuni' 

(40)  xJ(0  = 

where  Xj  is  the  eigenvalue  associated  with  i''  Lei  u> 
assume  that  al  least  one  of  the  eigenvalues  is  real  and 
positive.  Let  Aj  be  the  largest  of  the  positives  eigenval¬ 
ues.  Then  for  t  very  large  we  have 

x(0  ft: 

Hence  escape  time  from  a  very  large  circle  around  tin- 
origin  is  approximately 


Fx(x) 


ill  M  —  In  llc) !  ’  (I 

a7(A) 


Essentially  the  problem  is  reduced  to  the  one- 
dimensional  problem  considered  above  If  A  is  a  suf¬ 
ficiently  smooth  function  of  A,  the  maxiiinini  eigenvalue 
A)  is  a  sufficiently  smooth  function  of  A  and  the  require¬ 
ments  of  proposition  3.1  are  satisfied.  All  this  ran  be 
made  more  rigorous. 

It  seems  that  almost  all  linear  systems  of  differential 
equations  would  give  rise  to  dynamical  systems  exhibit¬ 
ing  a  linear  parameter  sensitivity  laiv  with  respect  to 
global  functions  measuring  convergence  time  to  the  fixed 
point  and  infinity.  A  more  precise  and  rigorous  theory 
of  global  estimation  on  linear  systems  can  easily  be  de¬ 
veloped. 

In  the  previous  section  we  have  disrus.sed  discrete 
maps  rather  than  continuous  ones.  In  those  cases  time 
is  discrete  and  hence  Fx  might  noi  be  differentiable  with 
respect  to  A.  For  example,  if  we  ke  the  discrete  dy¬ 
namical  system: 


+  l  *  AXfi,  A  ^  1 

then  escape  time  out  of  a  circle  of  radius  A7  is  given  by 


Fx(x) 


■logllfll 

log  A 


FVoposition  31  can  still  be  applied  in  the  sense  that 
the  curves  of  equal  discrete  escape  time  are  just  ap¬ 
proximations  to  curves  of  the  continuous  escape  time 
(r'^ix)  =  log  ll/log  A  which  satisfies  the  requirements 
of  the  proposition. 


3.2  Additioual  Examples 
3.2.1  The  Forced  Peiidiilum 

So  far  we  have  only  seen  discrete  dynamical  systems 
from  complex  analytic  dynamics  exhibiting  a  positive 
parameter  sensitivity  exponent  (p.s.e.).  In  this  section, 
we  provide  an  example  of  a  continuous  time  dynamical 
system  with  high  p.s.e.  which  has  a  completely  differ¬ 
ent  dynamical  structure,  and  enforces  the  belief  for  the 
universality  of  the  parameter  sensitivity  power  law 

In  particular,  we  consider  the  forced  pendulum  de¬ 
scribed  by  the  equation  (see  [3,  23]) 

(41)  -y-T- -)- A— -t- Tfsin ^  =  7  cos/ 


(39) 
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Figure  14:  Plot  of  log(6FA,iA)  vs.  log||6A||  for  the 
quadratic  map  with  c  =  A  +  0.74i,  A  =  -O.l’i,  where 
Fx  is  escape  time  out  of  a  circle  of  radius  K  =  100.  A 
resolution  of  512  x  512  cells  is  used,  and  the  domain  cho¬ 
sen  is  D  =  [0.3, 0.4]  X  [0.3, 0.4).  The  plot  on  the  top  is  a 
display  of  the  data  in  L'-norm.  The  slope  is  d  w  0.54. 
The  plot  on  the  bottom  shows  the  data  in  L^-,  L^-.  L*- 
norm  (from  top  to  bottom)  with  slopes  %  0.23,0.13,0.08 
respectively. 


For  parameter  values  A  =  0.1, =  1.0.-)  =  17.")  the 
system  has  (at  least)  four  attracting  periodic  orbits  (each 
having  period  2^).  For  the  time  2t  1‘oincarp  return  ma). 
these  orbits  are  attracting  fixed  points  located  at 

(42)  Pi  ss  (3.287.0.262) 

(43)  P2  «  (4.301.0.397) 

(44)  P3  ss  (0.0.53,-1.070) 

(45)  P4  %  (0.084.1.608) 

Let  us  assume  the  unknown  parameter  is  A.  for  a  given 
a  domain  D  define  the  global  function  Fx  .  D  —  !?  to 
have  discrete  representation  Ax  assigning  to  each  cell 
the  label  of  the  fixed  point  its  centerpoint  is  attracted 
to.  This  means  that  if  the  centerpoint  p  is  attracted 
to  p,  we  assign  to  the  cell  the  value  i.  This  is  a  very 
coarse  representation  of  a  global  function:  we  do  not 
record  convergence  time  or  any  such  infortiiation:  just 
the  label  of  the  attractor.  Figure  15  shows  a  picture  of 
the  resulting  image  for  a  128  x  128  cell  decomposition  of 
the  domain  D  =  [0,2tr]  x  [—2,4].  All  computations  have 
been  made  with  a  Bulirsch-Stoer  integrator. 

Figure  16  demonstrates  that  the  local  parameter  sen¬ 
sitivity  power  law  is  obeyed  in  a  very  impressive  way 
around  A  =  0.1.  The  parameter  sensitivity  exponent 
seems  to  be  7  =  1  —  d  ss  0.907  This  is  an  enormous 
p.s.e.,  giving  a  global  parameter  estimation  algorithm 
with  very  impressive  performance. 

3.2.2  The  teut  map 

The  next  step  is  to  consider  the  simplest  possible 
systems  for  which  the  parameter  sensitivity  power  law 
holds,  and  for  which  an  analytic  proof  is  possible. 

To  that  end,  consider  the  tent  map: 

/a  =  /  :  /  -  / 


if  r  <  0.5, 
i)  if  j-  >  0.5. 


where  /  =  [0,  1]  is  the  unit  interval.  The  global  function 
Fx  :  D  —  Ik,  D  C  I  that  we  will  consider,  just  like 
in  previous  ettses,  met^ures  escape  time  from  an  interval 
\—M.  M].  in  particular,  we  will  take  M  =  \.  i.e.  measure 
escape  times  from  the  unit  interval. 

If  the  slope  A  is  le.s.*?  than  I ,  then  no  point.s  ever  e.<;ca|)e 
from  the  unit  interval.  Consequently,  Fx(x)  =  0,  for  all 
x£l. 

Let  us  restrict  to  the  case  A  >  1 .  If  we  define  a  to  be 
1/A,  then  we  notice  that  the  points  in  the  interval 

/,(A)  =  [a,l-a] 

escape  after  one  iteration  of  the  tent  map  fx-  Moreover, 
points  in  the  intervals 

/^(A)  =  [o'^a  -  n'%  /2(A)  =  [1  -  (o  -  o^),  1  -  n^] 
escape  after  2  iterations,  and  points  in 
/^(A)  =  [o^o2  - 
/|(A)  =  [o-(o2-o3).o-o2] 
ll(X)  =  [\-Ur 

/5(A)  =  [1  -  (o  -  a^).  1  -  -  (n2  -  o3))] 
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Figure  15:  The  basins  of  attraction  of  the  four  fixed  points  of  the  2x  Poincare  return  map  for  the  forced  pen<luluni 
with  A  =  0.1,/?  =  1.0,7  =  domain  D  =  |0,2»)  x  [-2, 4). 
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10000. 


Figure  16:  F’lot  of  log(iF’x,«x)  vs.  log||6A||  for  the  forced 
pendulum  for  A  =  0.1,/?  =  1.0,7  =  1.75.  The  are  four 
attracting  fixed  points  {pi.pz.pa.p^}  for  the  ‘2ir  Poincare 
return  map.  Fx(x)  is  just  the  label  i  of  the  fixed  point 
to  which  I  is  attracted  (1  <  J  <  4).  A  resolution  of 
128  X  128  cells  is  used,  and  the  domain  chosen  is  D  = 
(0,25r]  X  [—2,4].  L'-norm  is  used  and  the  slope  of  the 
line  is  approximately  0.093!!! 

escape  after  3  iterations.  In  general,  there  are  2*“'  in¬ 
tervals  each  of  which  has  length  /*  =  o*“’(l  —  2o)  con¬ 
taining  points  that  escape  after  k  iterations. 

Figure  17  clearly  shows  that  the  parameter  sensitivity 
power  law  holds,  and  that  the  p.s.e.  (the  slope  of  the 
lines  in  the  log-log  plots)  is  an  invariant  of  the  system 
independent  of  the  resolution. 

3.2.3  A  fractal  curve  boundary  of  attraction 
(liven  1  <  A  <  2,  consider  the  map  ([14],  [18])  Mx  = 
A/  ;  !R  X  ,S''  —  K  X  .S'* ,  defined  by 

(47)  M(x*,fft)  =  (**+!. e*+i) 

where 


Figure  17.  Plots  of  log(6FA,iA)  vs.  logl|6A||  for  the  tent 
map,  with  A  =  3,  where  Fa  is  escape  time  out  of  liie 
unit  interval,  with  a  maximum  number  of  iterations  tol¬ 
erated  equal  to  1000.  The  domain  chosen  is  the  unit 
interval.  The  plots  are,  from  top  to  bottom,  the  data 
with  a  resolution  of  200000, 100000,  .50000,  .30000. 10000 
cells.  The  slope  of  the  linear  segments  is  about  d  »  0.35, 
i.e.  7  a:  0.65. 


(48)  xa+1  =  Ai* -(- cosfl* 

(49)  Ok+i  =  20t  (mod  25r) 

This  map  has  two  attractors  -)-oo  and  -oc  for  the 
first  component,  meaning,  if  /"(zo.^o)  =  (XnJn)  then 
limn—cw  x„  =  ±oo.  M  has  no  finite  attractors  since  the 
eigenvalues  of  the  Jacobian  matrix  are  2  and  A  >  1. 

To  calculate  what  the  boundary  between  the  basins 
of  attraction  j4a(±oo),  we  proceed  as  in  [18].  We  first 
notice  that  given  any  initial  point  (io,^o).  we  have 
0t  =  2*®o.  The  map  M  is  two  to  one  (and  hence  nonin- 
vertible),  but  given  any  point  xs  and  On  =  2'^0o  we  can 


Figure  19;  Plot  of  log(^FA,*A)  vs.  log||AAl|  for  the 
map  47.  where  Fx  's  escape  time  out  of  a  circle  of  ra¬ 
dius  ft  =  150  (maximum  number  of  iterations  tolerated 
is  200).  The  parameter  value  is  A  =  1.5,  a  re.solution 
of  256  X  256  cells  is  used,  and  the  domain  chosen  is 
D  =  [—1.3,  — 1.2]  X  [0.1. 0.2].  The  plots  are.  from  top 
to  bottom,  the  data  in  L' ,  L*,  L^,  norm. 


Figure  18:  Plot  of  log(6FA,«A)  vs.  log||6A||  for  the 
map  47,  where  F\  is  escape  time  out  of  a  circle  of  ra¬ 
dius  ft  =  150  (maximum  number  of  iterations  tolerated 
is  200).  The  parameter  value  is  A  =  1.5,  a  resolution 
of  256  X  256  cells  is  used,  and  the  domain  chosen  is 
D  =  [-1,3, -1.2]  X  (0.1, 0.2]. 


always  select  an  orbit  that  ends  at  (xyv,^/v).  by  taking 
Xi-i  =  A~*xt  —  A"'  cos(2*~'ffo).  This  orbit  starts  at 


N-i 


(50)  xo  =  A-'^xn  -  fos(2'«o) 


The  boundary  between  the  two  basins  y4A(±oo)  are  those 
(xo,9o)  for  which  remains  finite  as  N  —  oc.  So: 


where 


(51) 


dAx(±oo)  =  {(x,ff)  :  X  = 


I 

MO)  = coscre) 

i=0 


Since  A  >  1  the  above  sum  converges  absolutely  and 
uniformly.  In  addition,  the  following  sum 

tsO 

diverges,  because  A  <  2.  Hence  the  curve  fx{6)  is  nondif- 
ferentiable.  Moreover  it  has  been  proved  (in  [21])  fx{^) 
has  fractal  dimension  dc  =  2  —  in  A/  In  2. 

Figure  18  is  evidence  that  the  parameter  sensitivity 
power  law  holds  locally,  and  figure  19  proves  that  the 
power  law  does  not  hold  pointwise. 


4  The  Effect  of  Noise  on  Convergence 

We  notice  that  the  information  obtained  from  each  orbit 
(number  of  iterations  it  takes  to  enter  a  neighborhood 
of  an  attractor)  is  very  insensitive  to  observation  noise, 
i.e.  noise  that  enters  in  the  measurement  equation  (for 
example  equation  11  for  the  quadratic  family)".  More¬ 
over,  observation  noise  is  averaged  away  by  taking  the 
norm  of  the  corresponding  images.  ( lonsequently,  ob¬ 
servation  noise  has  no  important  effect  on  convergence 
properties  of  the  global  method. 

It  is  much  more  interesting  to  see  how  the  dynamic 
noise,  i.e.  noise  that  enters  in  the  dynamic  evolution 
equation,  affects  the  convergence  and  accuracy  of  the 
global  approach.  It  is  very  important  to  check  that  rea¬ 
sonably  small  dynamic  noise  does  not  destroy  conver¬ 
gence  properties  of  the  global  algorithms  because  in  all 
.systems  .some  sort  of  small  dynamical  noise  alway.s  ex¬ 
ists.  In  particular,  all  numerical  simulations  introduce 
dynamic  noise  by  rounding  off. 

Let  us  call  the  the  log-log  plot  of  the  U'  distance  of  the 
image  Fa+<a  generated  for  parameter  value  A  -t-  i A  from 
the  noisy  image  Fx  corresponding  to  the  true  value  A  a-s 
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"We  believe  this  is  a  remarkable  property:  usiiiR  very 
coarse  information  from  each  orbit  (coarse  buildinR  blocks), 
we  build  a  structure  containing  global  inforuiation  wliich  is 
very  delicate  with  respect  to  changes  in  parameters  (fine  over¬ 
all  structure). 


a  function  of  ||6A1|  the  performance  curve  of  the  global 
algorithm. 

Our  experiments  have  shown  that  the  presence  of  dy¬ 
namic  noise  has  the  effect  of  leveling  off  the  performance 
curve  of  the  global  algorithm  to  around  a  value  of  the 
order  of  the  difference  dp(Fx,  F\). 

It  is  very  important  to  note  that  if  the  dynamical  noise 
is  below  a  certain  value,  then  the  local  unimodality  of  the 
dissimilarity  function  is  not  destroyed.  This  is  demon¬ 
strated  by  the  experiments  depicted  in  figures  20  and 
21. 

Intuitively,  we  would  expect  the  presence  of  dynamic 
noise  not  to  be  disastrous,  since  the  results  of  sec¬ 
tions  2. 3, 2. 5  show  that  on  the  average  there  are  from  10 
to  150  iterations  per  brbit,  and  the  dynamic  noise  does 
not  have  the  time  to  manifest  itself  if  it  is  small  enough. 
On  the  contrary,  local  algorithms  attempt  to  follow  a 
nominal  trajectory  for  significantly  longer.  Hence,  the 
presence  of  dynamic  noise  can  have  disastrouss  effects 
on  the  convergence  properties  of  local  algorithms,  espe¬ 
cially  in  the  case  of  chaotic  dynamical  systems. 

5  Combining  Estimation  and  Control 

Let  us  onsider  again  the  problem  of  estimating  the  pa¬ 
rameter  A  in  the  case  of  the  quadratic  system  7,  but  now 
instead  of  assuming  that  ^  is  fixed,  suppose  ^  is  a  con¬ 
trol  parameter  that  we  can  tune.  We  want  to  address  the 
question  of  what  value  of^  gives  optimal  performance  for 
the  global  method  for  estimating  A. 

(Consider  the  domain  D  =  (-1.445,  -1.37]  x  [0.05,0.2], 
and  observe  how  the  performance  plots  of  the  global  al¬ 
gorithm  for  the  given  phase  space  window  change  as  we 
vary  A  (figure  24).  We  notice  that  as  we  approach  the 
boundary  of  the  Mandelbrot  set  the  global  algorithm 
reaches  better  and  better  accuracy,  increasing  the  pa¬ 
rameter  sensitivity  exponent  (p.s.e.)  7  =  1  -  d  for  the 
domain.  So  our  experiments  suggest  that  we  ran  max¬ 
imize  the  p.s.e.  7  and  hence  optimize  performance  of 
the  global  algorithm  by  choosing  ^  so  as  to  minimize  the 
distance  of  c  =  X  +  ^i  from  the  boundary  of  Mandelbrot 
set  M. 

We  remark  that  we  should  be  careful  about  the  above 
statement  since  the  distance  to  the  Mandelbrot  set  is  not 
the  only  thing  that  controls  7:  it  matters  a  lot  which 
part  of  the  Mandelbrot  set  we  are  close  to.  (onsider  for 
example  varying  A  close  to  the  point  c  =  —0.75  -t-  0? 
(the  round  top  of  the  main  c.ardioid  of  the  Mandelbrot 
set).  The  point  c  =  —0.75  is  on  the  boundary  of  the 
Mandelbrot  set,  but  tis  we  move  A  along  the  real  axis  we 
move  inside  the  Mandelbrot  set  obtaining  a  parameter 
sensitivity  exponent  (p.s.e.)  of  7  a:  1  —0.687  =  0.313.  A 
much  bigger  p.s.e.  (7  %  1  —  0.218  =  0.782)  is  obtained 
when  we  are  close  to  the  boundary  of  M  on  the  vertical 
line  A  =  — 1.0. 

We  have  seen  that  the  boundary  of  the  Mandelbrot 
set  is  a  bifurcation  set  for  the  quadratic  family,  since  the 
topology  of  the  Julia  set  changes  as  we  cross  it.  We  con¬ 
jecture  that  for  more  general  systems  we  can  get  optimal 
performance  out  of  a  global  estimation  algorithm  by  tun¬ 
ing  control  parameters  so  as  to  drive  the  system  near  a 
bifurcation  set  in  parameter  space.  Intuitively  we  expect 


Figure  20:  Quadratic  Family;  IMots  of  log(i!iFA.4A)  as 
a  function  of  log|(<iA(|,  S\  >  0,  for  ^  =  0.3,  for  varius 
strengths  of  dynamic  noise,  and  fixed  oliservation  noise 
with  (T„  =  10”^.  The  curves  (from  top  to  bottom)  cor¬ 
respond  to  dynamic  noise  erj  =  10“’’,  lO"'",  10“’". 

The  maximum  number  of  iterations  is  100,  and  the  do 

main  is  I)  =  {(x,i/)  :  x  €  [—0.079555, —0  079525],  j/  € 
[0.2(55320. 0.265350]}.  A  512x512  cell  re.soliition  is  used. 
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Figure  21:  Quadratic  Family:  Plots  of  log(6FA,«A)  as 
a  function  of  log||($A||,  >  0,  for  ^  =  0.3,  for  varius 

strengths  of  dynamic  noise,  and  fixed  observation  noise 
with  (T„  =  lO"®.  The  curves  (from  top  to  bottom)  cor¬ 
respond  to  dynamic  noise  aj  =  10~®,  10"*,  lO"'”.  The 
maximum  number  of  iterations  is  100,  and  the  domain 
is  D  =  {(a:,y)  :  a-  €  [-1.445, -1.37],  y  G  [0.05,0.2]}  A 
512  X  512  cell  resolution  is  used. 


to  have  much  more  sensitivity  to  changes  in  paraiueiers 
(high  p.s.e.)  as  we  get  close  to  bifurcations 

We  believe  it  would  be  very  interesting  tn  (ievelt)ji  a 
general  theory  of  optimal  global  geonieirir  esumatioii 
through  control  and  in  general  analy2r'  tin  di  in  iulri .  ,,f 
the  the  parameter  sensitivity  exponent  on  the  iHisiiion 
m  parameter  space. 

5.1  A  Global  Geometric  Controller  fur  the 
Quadratic  Family 

In  this  section  we  will  show  how  to  use  a  result  of  .1 
Milnor  and  W'.  Thurston  [25,  27]  for  bounding  the  di.-,- 
tance  to  the  Mandelbrot  set  A/,  to  obtain  a  global  geo¬ 
metric  controller  for  the  quadratic  family.  Our  method 
for  determining  the  control  ^  value  gives  an  astoni.sli- 
ingly  good  performance,  improving  the  initial  estimate 
by  more  than  13  orders  of  magnitude  (figure  2()).  and 
improving  the  best  estimate  that  we  had  for  i  =  U.3  by 
more  than  7  orders  of  magnitude  for  the  given  domain. 

5.1.1  Bouudiug  the  distance  to  A/ 

J.  Milnor  and  W’,  Thurston  proved  a  bound  for  the 
distance  d(c,M)  of  a  point  c  in  the  parameter  plane 
from  the  Mandelbrot  set  M.  on  which  some  algorithms 
for  producing  pictures  of  the  Mandelbrot  set  are  based. 
In  particular,  they  have  shown  the  following  result 
(see  [27]): 

Theorem  5.1  If  c  a  point  outside  of  M .  then 


sinh  C!(c) 
2expf/(r)|K;'(c)|| 


<  d(c,M)  < 


2sinh(>'(r) 


Similar  inequalities  can  be  obtained  for  points  inside  the 
Mandelbrot  set,  as  well  as  for  the  distance  of  points  in 
the  dynamical  plane  from  connected  Julia  sets. 

Let  us  approximate,  somewhat  arbitrarily,  the  dis¬ 
tance  of  a  point  r  outside  of  M ,  by  the  estimate  of  the 
upper  bound  in  inequality  52,  i,e,  let 


d(c,M) 


2  sinh  (r(r) 

l|O''(0ll 


For  c  near  A/,  we  may  approximate  sinh(i(r)  by  (He). 
A  further  approximation  to  f'(r)  gives: 


rf(r,A/)  w2] 


log||-n| 


where 


5.1.2  The  Controller 

Using  the  approximation  54  it  is  easy  to  build  a  pro¬ 
gram  that  minimizes  the  distance  to  A7.  First  of  all  in 
order  to  obtain  an  estimate  for  the  distance  of  c  from 
A7,  iterate: 

Ct+i  =  fri^k)  =  c*  -I-  r,  cu  =  0, =  0,  1 , 2.  .  .  . 

until  either  ||c*  +  i)|  >  R  (where  R  is  large)  or  k  =  A’,naj- 
where  Nmar  is  the  maximum  number  of  iterations  that 
we  allow  (we  take  A'„,ox  =  1000).  If  we  stop|)ed  at 
k  =  A’mar  we  let  d(c,X1)  =  0.  If  we  have  stojiped  at 
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Figure  25:  Plot  of  log(6FA,«x)  vs.  log||6A||  for  the 
quadratic  map  with  f  =  A  +  ^i,  ^  =  0.  where  Fx  is 
escape  time  out  of  a  circle  of  radius  R  —  100.  and  the 
cell  resolution  is  256  x  256.  The  real  global  function  is 
at  A  =  —0.75,  and  the  slope  is  approximately  0  687. 


II  <  A'mon  then  having  saved  the  orbit  {;o . ^n}  we 

compute: 

4+1  =2.-tr;  +  1.4  =  0.i-  =  0,l . 11-  1 

If  we  get  an  overflow  in  the  course  of  this  iteration,  then 
this  means  that  c  is  very  close  to  M  and  we  return 
d{c,M)=  —1.  Otherwise,  we  return 


Figure  24:  Plot  of  log(6F'A,«A)  vs.  log||6A||  for  the 
quadratic  map  with  r  =  A  +  ^i,  where  Fx  is  escape  time 
out  of  a  circle  of  radius  R  =  100,  and  the  cell  resolution 
is  512  X  512.  The  real  global  function  is  at  A  =  —  I,  and 
is  meeisured  with  Gaussian  observation  noise  with  vari¬ 
ance  (To  =  10”^  and  no  dynamical  noise  {(Tj  =  0).  From 
top  to  bottom,  the  lines  correspond  to  ^  =  0.3,  ^  =  0.25. 
^  =  0.35.  ^  =  0.2,  ^  =  0.4. 


rf(c,M)  =  2jj^log||c„|| 

A  simple  program  implementing  this  is  shown  in  flg- 
ure  27. 

The  optimal  control  value  ^  is  the  value  for  which  the 
distance  estimate  d(A  -I-  A/)  is  minimized.  Figure  2S 

shows  a  listing  of  the  very  simple  program  that  achieves 
the  value  of  q  which  minimizes  the  estimate 

(MSetDist  (make-recteaigulax  p  q)  maxiter  R 
overflow) 

for  given  values  of  maxiter,  R,  overflow 

We  first  fix  a  value  of  ^  and  obtain  an  estimate  for 
A  to  feed  the  function  optimal-control  (figure  5.1.2) 
by  running  a  local  algorithm  (for  example  an  extendeil 
Kalman  filter).  The  local  algorithm  (implemented  by 
Elmer  Hung  [17])  that  we  have  insed  gave  an  accuracy 
of  10“^  —  10"*’.  The  function  optimal-control  will 
return  a  value  q  for  f  who.se  distance  from  the  trully 
optimal  value  will  be  of  the  order  of  10”''  -  10"'  Ue 
can  then  drive  F  to  the  estimate  q  and  run  the  global 


Figure  26:  Plot  of  log(6FA,<A)  vs.  log||6A||  for  the 
quadratic  map  with  r  =  A  +^i,  where  Fx  is  escape  time 
out  of  a  circle  of  radius  R  =  100,  and  the  cell  resolution 
is  512  X  512.  The  real  global  function  is  at  A  =  —1.  and 
is  measured  with  (iaussian  observation  noise  with  vari¬ 
ance  <r„  =  10“^  and  no  dynamical  noise  {cd  =  0).  From 
top  to  bottom,  the  lines  correspond  to  ^  =  0.2d65l2.'}7 
(d  »  0.218)  (,  =  0.3  (d  «  0.35),  f  =  0.25  (d  w  0.48). 
^  =  0.35  (d  «  0.59),  4  =  0.2  (d  »  0.58).  f  =  0.4 
(d  w  0.68). 


(define  (MSetDist  c  naxiter  R  overflos) 

(let  ((zorbit  (nake-vector  (+  nax- 
iter  1)  0.0))) 

(define  (orbit-loop  i  z) 

(cond  ((>1  naxiter) 

0.0) 

((>  (magnitude  z)  R) 

(der-loop  1  i  O.O+O.Oi)) 

(else 

(let  ((nevz  (+  (*  z  z)  c))) 
(vector-set!  zorbit  i  neez) 
(orbit-loop  (+  i  1)  neez))))) 
(define  (der-loop  i  iter  zder) 

(let  ((z  (vector-ref  zorbit  (-  i  1)))) 
(cond  ((>  (aagnitude  zder)  overfloo) 
-1) 

((»  i  iter) 

(let  ((m  (magnitude  z))) 

(/  (•  2  (•»  (log  m) ) ) 
(magnitude  zder)))) 

(else 

(der- 

loop  (+  i  1)  iter  (+  (»  2  (•  z  zder))  1)))))) 
(orbit-loop  0  O.O+O.Oi))) 

Figure  27:  A  program  estimating  the  distance  of  a  point 
c  in  the  parameter  plane  from  M . 


algorithm.  After  having  obtained  a  better  estimate  for  p 
(=  A)  from  the  global  algorithm,  we  may  feed  that  esti¬ 
mate  into  optimal -control  to  gel  a  l)etter  estimate  for 
the  optimal  control  value  and  repeat  the  whole  process, 
until  we  get  satisfactory  accuracy. 

6  Cooperation  between  local  and  global 
approaches 

We  know  that  the  dissimilarity  function  is  locally  uni- 
modal  but  we  have  to  have  a  way  of  entering  the  region 
of  iinimodality  in  order  for  the  global  algorithm  to  work. 

One  approach  is  to  look  at  the  domain  D  at  different 
cell  resolutions.  In  other  words  increase  the  number  of 
cells  u.sed  to  rej>resent  the  global  function.  As  we  get  in 
coarser  and  coarser  resolutions,  the  region  of  nnimodal- 
ity  starts  at  bigger  and  bigger  values.  A  more  interesting 
solution  to  this  problem  is  to  use  a  combination  of  lo¬ 
cal  and  global  approaches.  In  particular,  we  can  use  llie 
local  algorithms  to  obtain  an  estimate  of  the  parameter 
that  places  us  in  a  region  of  untmoeialtly  of  the  disstmi- 
lartly  function,  and  then  use  a  global  approach  to  :rro  in 
to  the  correct  parameter  value.  Numerical  experiments 
.seem  to  indicate  that  local  methods  we  have  tried,  reach 
rather  quickly  an  good  estimate  of  the  parameter  lull 
then  get  stuck  and  do  not  seem  to  improve  further  (prob¬ 
ably  because  of  numerical  problems,  invalid  linearization 
a.s.sumptions.  or  bad  finite  ap|>roximation  a.s.sumptions). 
The  global  algorithm,  given  the  estimate  obtained  liy 
the  local  method,  increases  it  by  many  orders  of  mag¬ 
nitude.  It  seems  that  a  cooperation  between  local  and 
global  approaches  is  the  right  way  to  attack  real  esi  ima- 
_  tion  problems. 


(d«iin«  (optimal-control  p  tol  ql  q2) 

(define  (loop  ql  q2  dl  d2 
(let*  ((q  (/  (+  ql  q2)  2)) 

(c  (nake-rectangular  p  q)) 

(d  (nsetdist  c  1000  1000  lelOO))) 
(cond  ((<  dl  tol) 
ql) 

((«  d  0.0) 

(loop  ql  q 

dl  d)) 

(else 

(loop  q  q2 

d  d2))))) 

(loop  ql  q2 

(nsetdist  (nake- 

rectangular  p  ql)  1000  1000  lelOO) 

(nsetdist  (nake- 

recteuigular  p  q2)  1000  1000  lelOO))) 

Figure  28:  A  program  returning  an  estimate  of  an  opti¬ 
mal  control  value  given  an  estimate  p  for  A  and  two 
points  ql  not  in  M,  and  q2  inside  M. 

7  Final  Comments 

In  this  paper,  we  propose  a  new  approach  to  parame¬ 
ter  estimation  based  on  exploiting  the  global  geometri¬ 
cal  complexity  of  nonlinear  dynamical  systems,  rather 
than  trying  to  do  local  approximations  as  the  classical 
algorithms  do. 

We  demonstrate  the  power  of  a  global  approach  in  the 
context  of  complex  analytic  dynamics.  Under  very  rea¬ 
sonable  magnification  and  noise  assumptions,  and  with  a 
careful  combination  of  global  estimation  and  control,  we 
reach  an  improvement  as  big  as  13  orders  of  magnitude 
over  the  initial  estimate. 

The  approach  which  we  follow  in  the  case  of  complex 
analytic  dynamics  can  be  extended  to  much  more  general 
settings. 

We  remark  that  the  choice  for  global  functions  that  we 
have  made  is  just  one  of  the  many  possibilities.  We  have 
just  given  one  iiupleuieutatiou  of  the  luuch  more 
general  idea  of  global  parameter  estimation.  In 
different  settings,  we  are  forced  to  chose  different  global 
functions  or  minimization  algorithms.  For  example,  in 
the  case  of  Hamiltonian  dynamical  systems  no  attractors 
exist  and  completely  different  global  functions  must  be 
devised  (for  example  functions  that  reflect  the  shape  of 
chaotic  layers). 

The  global  approach  is  computationally  much  more 
demanding  than  the  local  approaches  but  can  be  much 
more  accurate  and  can  have  more  robust  convergence 
properties.  With  the  use  of  increasingly  faster  computers 
the  disadvantage  mentioned  above  becomes  less  and  less 
important.  Moreover,  the  global  approach  is  ideally 
suited  for  parallel  computation,  opening  the  door 
to  tremendous  icrease  in  performance. 

There  are  many  new  directions  to  follow:  first  of  all  the 
power  law  that  the  dissimilarity  function  locally  obeys 
needs  to  be  understood  thoroughly  and  analyzed  the¬ 
oretically.  Many  interesting  open  questions  concerning 
the  nature  and  behavior  of  the  parameter  sensitivity  ex¬ 


ponent  7  also  arise.  To  our  knowledge  these  results  are 
completely  new. 

Moreover,  we  believe  that  it  would  be  interesting  to 
develop  a  general  theory  of  optimal  global  geometric  pa¬ 
rameter  estimation  through  control,  and  investigate  how 
p.s.e.  changes  with  the  position  of  the  parameter  in  |)a- 
rameter  space. 

The  ultimate  goal  is  to  use  chaos  and  instability  and 
combine  the  local  and  global  approaches  to  parameter 
estimation  in  order  to  obtain  breakthrough  extraordi¬ 
narily  precise  measurements  of  quantities  that  are  very 
difficult  to  measure,  such  as  the  Universal  Constant  of 
(iraviiy. 
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Appendix 

A  The  Golden  Section  Method 

The  Golden  .Section  Method  uses  the  fact  that  we  can 
bracket  the  location  of  the  minimum  of  a  unimodal  func¬ 
tion  by  evaluating  the  function  at  two  distinct  points  in 
the  region  L  of  unimodality. 

To  describe  how  it  works,  we  first  assume  that  the 
function  is  unimodal  in  the  interval  L]  =  {/i ,  ri].  Sup¬ 
pose  we  evaluate  the  function  at  two  points  xi.xo  in  L 
such  that  T\  <  12  and  find  that  /(xj )  <  /(xt).  It  follows 
from  the  definition  of  unimodality  that  A  €  [/i ,  X2]  Simi¬ 
larly,  if  /(xi )  >  /(X2),  then  we  must  have  A  6  [xj ,  rj].  If 
the  function  values  at  xi,X2  are  equal  then  A  €  [xi,X2], 
but  for  simplicity  we  may  again  consider  that  it  belongs 
to  any  one  of  the  above  bigger  intervals.  In  any  case,  af¬ 
ter  the  first  two  function  evaluations,  a  portion  of  Lj  to 
the  right  of  X2  or  to  the  left  of  xi  can  be  eliminated  from 
further  search.  If  Ln  =  [/2.»’2]  is  the  remaining  interval, 
we  can  obtain  two  more  function  evaluations  and  fur¬ 
ther  reduce  the  length  of  the  interval  containing  A.  By- 
using  this  procedure  we  can  keep  reducing  the  unimodal¬ 
ity  interval,  obtaining  an  increasingly  tighter  brarketing 
of  the  minimum  value. 

An  improved  version  of  the  above  naive  algorithm  is 
the  Fibonacci  method,  which  gets  its  name  from  using 
the  Fibonacci  sequence 

(06) 

Tq  =  0.  Ti  =  I,  Tk  =  J't-i  +  k  =  '2.'].... 

in  picking  the  points  at  which  the  function  is  evaluated. 
The  method  works  as  follows:  Let  N  be  the  total  number 
of  points  at  which  the  function  will  be  evaluated.  .Sup¬ 
pose  that  at  iteration  k,  the  interval  containing  A  (the 


local  minimum)  is  [/*, r*].  For  k  =  1,2,. . .  ,N  —  I,  the 
function  values  are  computed  at  the  two  points 

(57)  xt  =  /t+ 

>N  +  2-t 

(58)  x^  =  4  +  f^'~^(rt-4) 

■XW+2-t 


We  notice  that  (by  the  definition  of  the  Fibonacci  se¬ 
quence)  one  of  the  points  xf.x*  is  the  same  as  one  of 
the  points  at  a  previous  iteration.  Hence,  only  one  new 
function  evaluation  is  required  at  each  point.  This  is  ex¬ 
tremely  important  in  our  case  where  function  evaluations 
are  computationally  very  expensive. 

One  of  the  disadvantages  of  the  Fibonacci  method 
is  that  the  number  of  function  evaluations  N  must  be 
known  in  advance,  (jetting  rid  of  this  requirement  leads 
to  a  method  known  as  the  Golden  seclton  method,  which 
is  a  good  approximation  to  the  Fibonacci  search.  It  can 
be  shown  that 


(59) 


lim 


\  _ 

T  ~  2 


Tf;  T 

The  golden  section  method  then  places  the  points  at 
which  the  function  is  to  be  evaluated  at 


(60)  =4  +  ^(rt-4) 

(61)  x*=4  +  ;(r*-4) 


Again,  only  one  function  evaluation  is  required. 
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